AITopics | novel dataset

Collaborating Authors

novel dataset

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

From Annotation to Adaptation: Metrics, Synthetic Data, and Aspect Extraction for Aspect-Based Sentiment Analysis with Large Language Models

Neveditsin, Nikita, Lingras, Pawan, Mago, Vijay

arXiv.org Artificial IntelligenceMar-26-2025

This study examines the performance of Large Language Models (LLMs) in Aspect-Based Sentiment Analysis (ABSA), with a focus on implicit aspect extraction in a novel domain. Using a synthetic sports feedback dataset, we evaluate open-weight LLMs' ability to extract aspect-polarity pairs and propose a metric to facilitate the evaluation of aspect extraction with generative models. Our findings highlight both the potential and limitations of LLMs in the ABSA task.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2503.20715

Country:

North America > Canada > Ontario > Toronto (0.04)
North America > Mexico > Mexico City > Mexico City (0.04)
North America > United States > Colorado > Denver County > Denver (0.04)
(6 more...)

Genre: Research Report > Experimental Study (1.00)

Industry:

Education (1.00)
Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Generalization of Video-Based Heart Rate Estimation Methods To Low Illumination and Elevated Heart Rates

Acharya, Bhargav, Saakyan, William, Hammer, Barbara, Drimalla, Hanna

arXiv.org Artificial IntelligenceMar-11-2025

Heart rate is a physiological signal that provides information about an individual's health and affective state. Remote photoplethysmography (rPPG) allows the estimation of this signal from video recordings of a person's face. Classical rPPG methods make use of signal processing techniques, while recent rPPG methods utilize deep learning networks. Methods are typically evaluated on datasets collected in well-lit environments with participants at resting heart rates. However, little investigation has been done on how well these methods adapt to variations in illumination and heart rate. In this work, we systematically evaluate representative state-of-the-art methods for remote heart rate estimation. Specifically, we evaluate four classical methods and four deep learning-based rPPG estimation methods in terms of their generalization ability to changing scenarios, including low lighting conditions and elevated heart rates. For a thorough evaluation of existing approaches, we collected a novel dataset called CHILL, which systematically varies heart rate and lighting conditions. The dataset consists of recordings from 45 participants in four different scenarios. The video data was collected under two different lighting conditions (high and low) and normal and elevated heart rates. In addition, we selected two public datasets to conduct within- and cross-dataset evaluations of the rPPG methods. Our experimental results indicate that classical methods are not significantly impacted by low-light conditions. Meanwhile, some deep learning methods were found to be more robust to changes in lighting conditions but encountered challenges in estimating high heart rates. The cross-dataset evaluation revealed that the selected deep learning methods underperformed when influencing factors such as elevated heart rates and low lighting conditions were not present in the training set.

dataset, heart rate, scenario, (16 more...)

arXiv.org Artificial Intelligence

2503.11697

Country:

Oceania > Australia (0.04)
Europe > United Kingdom > England > Tyne and Wear > Newcastle (0.04)
Europe > Germany > North Rhine-Westphalia (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Deep Learning for Wound Tissue Segmentation: A Comprehensive Evaluation using A Novel Dataset

Kabir, Muhammad Ashad, Roy, Nidita, Hossain, Md. Ekramul, Featherston, Jill, Ahmed, Sayed

arXiv.org Artificial IntelligenceFeb-14-2025

Deep learning (DL) techniques have emerged as promising solutions for medical wound tissue segmentation. However, a notable limitation in this field is the lack of publicly available labelled datasets and a standardised performance evaluation of state-of-the-art DL models on such datasets. This study addresses this gap by comprehensively evaluating various DL models for wound tissue segmentation using a novel dataset. We have curated a dataset comprising 147 wound images exhibiting six tissue types: slough, granulation, maceration, necrosis, bone, and tendon. The dataset was meticulously labelled for semantic segmentation employing supervised machine learning techniques. Three distinct labelling formats were developed -- full image, patch, and superpixel. Our investigation encompassed a wide array of DL segmentation and classification methodologies, ranging from conventional approaches like UNet, to generative adversarial networks such as cGAN, and modified techniques like FPN+VGG16. Also, we explored DL-based classification methods (e.g., ResNet50) and machine learning-based classification leveraging DL features (e.g., AlexNet+RF). In total, 82 wound tissue segmentation models were derived across the three labelling formats. Our analysis yielded several notable findings, including identifying optimal DL models for each labelling format based on weighted average Dice or F1 scores. Notably, FPN+VGG16 emerged as the top-performing DL model for wound tissue segmentation, achieving a dice score of 82.25%. This study provides a valuable benchmark for evaluating wound image segmentation and classification models, offering insights to inform future research and clinical practice in wound care. The labelled dataset created in this study is available at https://github.com/akabircs/WoundTissue.

artificial intelligence, machine learning, wound tissue segmentation, (3 more...)

arXiv.org Artificial Intelligence

2502.10652

Genre: Research Report (1.00)

Industry: Health & Medicine (0.53)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.60)

Add feedback

Smart OMVI: Obfuscated Malware Variant Identification using a novel dataset

Qamar, Suleman

arXiv.org Artificial IntelligenceSep-24-2023

Cybersecurity has become a significant issue in the digital era as a result of the growth in everyday computer use. Cybercriminals now engage in more than virus distribution and computer hacking. Cyberwarfare has developed as a result because it has become a threat to a nation's survival. Malware analysis serves as the first line of defence against an attack and is a significant component of cybercrime. Every day, malware attacks target a large number of computer users, businesses, and governmental agencies, causing billions of dollars in losses. Malware may evade multiple AV software with a very minor, cunning tweak made by its designers, despite the fact that security experts have a variety of tools at their disposal to identify it. To address this challenge, a new dataset called the Obfuscated Malware Dataset (OMD) has been developed. This dataset comprises 40 distinct malware families having 21924 samples, and it incorporates obfuscation techniques that mimic the strategies employed by malware creators to make their malware variations different from the original samples. The purpose of this dataset is to provide a more realistic and representative environment for evaluating the effectiveness of malware analysis techniques. Different conventional machine learning algorithms including but not limited to Support Vector Machine (SVM), Random Forrest (RF), Extreme Gradient Boosting (XGBOOST) etc are applied and contrasted. The results demonstrated that XGBoost outperformed the other algorithms, achieving an accuracy of f 82%, precision of 88%, recall of 80%, and an F1-Score of 83%.

novel dataset, obfuscated malware variant identification, smart omvi

arXiv.org Artificial Intelligence

2310.1067

Genre: Research Report (0.40)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.53)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.53)

Add feedback

A Novel Dataset Towards Extracting Virus-Host Interactions

Alshawi, Rasha, Sen, Atriya, Upham, Nathan S., Sterner, Beckett

arXiv.org Artificial IntelligenceMay-11-2023

We describe a novel dataset for the automated recognition of named taxonomic and other entities relevant to the association of viruses with their hosts. We further describe some initial results using pre-trained models on the named-entity recognition (NER) task on this novel dataset. We propose that our dataset of manually annotated abstracts now offers a Gold Standard Corpus for training future NER models in the automated extraction of host-pathogen detection methods from scientific publications, and further explain how our work makes first steps towards predicting the important human health-related concept of viral spillover risk automatically from the scientific literature.

artificial intelligence, logic & formal reasoning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2305.13317

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > Arizona (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.71)
Health & Medicine > Therapeutic Area > Immunology (0.49)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.69)

Add feedback

Mask-guided BERT for Few Shot Text Classification

Liao, Wenxiong, Liu, Zhengliang, Dai, Haixing, Wu, Zihao, Zhang, Yiyang, Huang, Xiaoke, Chen, Yuzhong, Jiang, Xi, Liu, Wei, Zhu, Dajiang, Liu, Tianming, Li, Sheng, Li, Xiang, Cai, Hongmin

arXiv.org Artificial IntelligenceMar-8-2023

Transformer-based language models have achieved significant success in various domains. However, the data-intensive nature of the transformer architecture requires much labeled data, which is challenging in low-resource scenarios (i.e., few-shot learning (FSL)). The main challenge of FSL is the difficulty of training robust models on small amounts of samples, which frequently leads to overfitting. Here we present Mask-BERT, a simple and modular framework to help BERT-based architectures tackle FSL. The proposed approach fundamentally differs from existing FSL strategies such as prompt tuning and meta-learning. The core idea is to selectively apply masks on text inputs and filter out irrelevant information, which guides the model to focus on discriminative tokens that influence prediction results. In addition, to make the text representations from different categories more separable and the text representations from the same category more compact, we introduce a contrastive learning loss function. Experimental results on public-domain benchmark datasets demonstrate the effectiveness of Mask-BERT.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2302.10447

Country:

Europe > Austria > Vienna (0.14)
Asia > China (0.05)
North America > United States > Virginia (0.04)
(2 more...)

Genre:

Research Report (0.50)
Overview (0.47)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)

Add feedback

ColBERT: Using BERT Sentence Embedding in Parallel Neural Networks for Computational Humor

Annamoradnejad, Issa, Zoghi, Gohar

arXiv.org Artificial IntelligenceDec-1-2022

Automation of humor detection and rating has interesting use cases in modern technologies, such as humanoid robots, chatbots, and virtual assistants. In this paper, we propose a novel approach for detecting and rating humor in short texts based on a popular linguistic theory of humor. The proposed technical method initiates by separating sentences of the given text and utilizing the BERT model to generate embeddings for each one. The embeddings are fed to separate lines of hidden layers in a neural network (one line for each sentence) to extract latent features. At last, the parallel lines are concatenated to determine the congruity and other relationships between the sentences and predict the target value. We accompany the paper with a novel dataset for humor detection consisting of 200,000 formal short texts. In addition to evaluating our work on the novel dataset, we participated in a live machine learning competition focused on rating humor in Spanish tweets. The proposed model obtained F1 scores of 0.982 and 0.869 in the humor detection experiments which outperform general and state-of-the-art models. The evaluation performed on two contrasting settings confirm the strength and robustness of the model and suggests two important factors in achieving high accuracy in the current task: 1) usage of sentence embeddings and 2) utilizing the linguistic structure of humor in designing the proposed model.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2004.12765

Country:

Europe > Spain > Basque Country > Biscay Province > Bilbao (0.04)
Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report > Promising Solution (0.68)

Industry: Media > News (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Improving Chest X-Ray Classification by RNN-based Patient Monitoring

Biesner, David, Schneider, Helen, Wulff, Benjamin, Attenberger, Ulrike, Sifa, Rafet

arXiv.org Artificial IntelligenceOct-28-2022

Chest X-Ray imaging is one of the most common radiological tools for detection of various pathologies related to the chest area and lung function. In a clinical setting, automated assessment of chest radiographs has the potential of assisting physicians in their decision making process and optimize clinical workflows, for example by prioritizing emergency patients. Most work analyzing the potential of machine learning models to classify chest X-ray images focuses on vision methods processing and predicting pathologies for one image at a time. However, many patients undergo such a procedure multiple times during course of a treatment or during a single hospital stay. The patient history, that is previous images and especially the corresponding diagnosis contain useful information that can aid a classification system in its prediction. In this study, we analyze how information about diagnosis can improve CNN-based image classification models by constructing a novel dataset from the well studied CheXpert dataset of chest X-rays. We show that a model trained on additional patient history information outperforms a model trained without the information by a significant margin. We provide code to replicate the dataset creation and model training.

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2210.16074

Country:

Europe > Germany > North Rhine-Westphalia > Cologne Region > Bonn (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > France > Grand Est > Bas-Rhin > Strasbourg (0.04)
Atlantic Ocean > Black Sea (0.04)

Genre: Research Report > New Finding (0.35)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Vehicle and License Plate Recognition with Novel Dataset for Toll Collection

Usama, Muhammad, Anwar, Hafeez, Shahid, Muhammad Muaz, Anwar, Abbas, Anwar, Saeed, Hlavacs, Helmuth

arXiv.org Artificial IntelligenceFeb-11-2022

We propose an automatic framework for toll collection, consisting of three steps: vehicle type recognition, license plate localization, and reading. However, each of the three steps becomes non-trivial due to image variations caused by several factors. The traditional vehicle decorations on the front cause variations among vehicles of the same type. These decorations make license plate localization and recognition difficult due to severe background clutter and partial occlusions. Likewise, on most vehicles, specifically trucks, the position of the license plate is not consistent. Lastly, for license plate reading, the variations are induced by non-uniform font styles, sizes, and partially occluded letters and numbers. Our proposed framework takes advantage of both data availability and performance evaluation of the backbone deep learning architectures. We gather a novel dataset, \emph{Diverse Vehicle and License Plates Dataset (DVLPD)}, consisting of 10k images belonging to six vehicle types. Each image is then manually annotated for vehicle type, license plate, and its characters and digits. For each of the three tasks, we evaluate You Only Look Once (YOLO)v2, YOLOv3, YOLOv4, and FasterRCNN. For real-time implementation on a Raspberry Pi, we evaluate the lighter versions of YOLO named Tiny YOLOv3 and Tiny YOLOv4. The best Mean Average Precision (mAP@0.5) of 98.8% for vehicle type recognition, 98.5% for license plate detection, and 98.3% for license plate reading is achieved by YOLOv4, while its lighter version, i.e., Tiny YOLOv4 obtained a mAP of 97.1%, 97.4%, and 93.7% on vehicle type recognition, license plate detection, and license plate reading, respectively. The dataset and the training codes are available at https://github.com/usama-x930/VT-LPR

novel dataset, toll collection, vehicle and license plate recognition

arXiv.org Artificial Intelligence

2202.05631

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Exploiting Unsupervised Inputs for Accurate Few-Shot Classification

Hu, Yuqing, Gripon, Vincent, Pateux, Stéphane

arXiv.org Machine LearningJan-27-2020

In few-shot classification, the aim is to learn models able to discriminate classes with only a small number of labelled examples. Most of the literature considers the problem of labelling a single unknown input at a time. Instead, it can be beneficial to consider a setting where a batch of unlabelled inputs are treated conjointly and non-independently. In this paper, we propose a method able to exploit three levels of information: a) feature extractors pretrained on generic datasets, b) few labelled examples of classes to discriminate and c) other available unlabelled inputs. If for a), we use state-of-the-art approaches, we introduce the use of simplified graph convolutions to perform b) and c) together. Our proposed model reaches state-of-the-art accuracy with a $6-11\%$ increase compared to available alternatives on standard few-shot vision classification datasets.

arxiv preprint arxiv, dataset, learning, (13 more...)

arXiv.org Machine Learning

2001.09849

Country: Europe > France (0.04)

Genre: Research Report > Promising Solution (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback